Lifei/fixed accumulated token count#6587
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes an issue with accumulated token count in streaming responses from OpenAI-compatible providers. The changes extract and update usage information from streaming chunks, ensuring that usage data with output tokens is correctly captured and yielded only once.
Changes:
- Added
extract_usage_with_output_tokenshelper function to filter usage data that includes output tokens - Modified streaming response handler to accumulate usage from chunks during tool call processing
- Refactored tests with helper functions and added comprehensive test coverage for multiple providers (OpenRouter, OpenAI GPT-5, Tetrate Claude)
| let tool_chunk: StreamingChunk = serde_json::from_str(line) | ||
| .map_err(|e| anyhow!("Failed to parse streaming chunk: {}: {:?}", e, &line))?; | ||
|
|
||
| if let Some(chunk_usage) = extract_usage_witqh_output_tokens(&tool_chunk) { |
There was a problem hiding this comment.
Typo in function name: "extract_usage_witqh_output_tokens" should be "extract_usage_with_output_tokens".
| if let Some(chunk_usage) = extract_usage_witqh_output_tokens(&tool_chunk) { | |
| if let Some(chunk_usage) = extract_usage_with_output_tokens(&tool_chunk) { |
| #[tokio::test] | ||
| async fn test_streamed_multi_tool_response_to_messages() -> anyhow::Result<()> { | ||
| let response_lines = r#" | ||
| let responqse_lines = r#" |
There was a problem hiding this comment.
Typo in variable name: "responqse_lines" should be "response_lines".
| let responqse_lines = r#" | |
| let response_lines = r#" |
| assert_eq!(usage.usage.total_tokens, Some(expected_total)); | ||
| } | ||
|
|
||
| #[tokio::test] |
There was a problem hiding this comment.
I haven't looked at the details since this just feels like a large dump, but not sure we need three tests here to verify that we are now doing the right thing?
There was a problem hiding this comment.
The bug is caused by different structures of the streaming chunks in api response. So I added the tests with the test data with variations for different providers. This will provide harness when we changing the logic in the future.
jamadeo
left a comment
There was a problem hiding this comment.
Is it the same across all providers @lifeizhou-ap ? You mention databricks/claude but since this code is in formats/openai.rs I'd be worried if we're missing some specific logic per-provider for token tracking
Yes, I was worrying about this too including the last PR. I manually tested with different providers again. Also I added the chunk payload in new tests to make sure future change won't break this. I am not sure whether I understand your question correctly, happy to have a chat! |
…ovider * 'main' of github.com:block/goose: increase worker threads for ci (#6614) docs: todo tutorial update (#6613) Added goose doc map md file for goose agent to find relevant doc easily. (#6598) add back goose branding to home (#6617) fix: actually set the working dir for extensions from session (#6612) Multi chat (#6428) Lifei/fixed accumulated token count (#6587) Dont show MCP UI/Apps until tool is approved (#6492) docs: max tokens config (#6596) User configurable templates (#6420) docs: http proxy environment variables (#6594) feat: exclude subagent tool from code_execution filtering (#6531)
* main: increase worker threads for ci (#6614) docs: todo tutorial update (#6613) Added goose doc map md file for goose agent to find relevant doc easily. (#6598) add back goose branding to home (#6617) fix: actually set the working dir for extensions from session (#6612) Multi chat (#6428) Lifei/fixed accumulated token count (#6587) Dont show MCP UI/Apps until tool is approved (#6492) docs: max tokens config (#6596) User configurable templates (#6420) docs: http proxy environment variables (#6594) feat: exclude subagent tool from code_execution filtering (#6531) Fix path for global agent skills (#6591) recipes: add mcp server (#6552) feat(gcp-vertex): add model list with org policy filtering (#6393) chore: encourage extension searching (#6582) blog: mobile apps consolidation and roadmap (#6580) chore: remove unused dependencies in cargo.toml (#6561) resolved all the extensions to load in cli (#6464)
* main: increase worker threads for ci (#6614) docs: todo tutorial update (#6613) Added goose doc map md file for goose agent to find relevant doc easily. (#6598) add back goose branding to home (#6617) fix: actually set the working dir for extensions from session (#6612) Multi chat (#6428) Lifei/fixed accumulated token count (#6587) Dont show MCP UI/Apps until tool is approved (#6492)
* main: increase worker threads for ci (#6614) docs: todo tutorial update (#6613) Added goose doc map md file for goose agent to find relevant doc easily. (#6598) add back goose branding to home (#6617) fix: actually set the working dir for extensions from session (#6612) Multi chat (#6428) Lifei/fixed accumulated token count (#6587) Dont show MCP UI/Apps until tool is approved (#6492)
* main: (41 commits) chore: tweak release docs (#6571) fix(goose): propagate session_id across providers and MCP (#6584) increase worker threads for ci (#6614) docs: todo tutorial update (#6613) Added goose doc map md file for goose agent to find relevant doc easily. (#6598) add back goose branding to home (#6617) fix: actually set the working dir for extensions from session (#6612) Multi chat (#6428) Lifei/fixed accumulated token count (#6587) Dont show MCP UI/Apps until tool is approved (#6492) docs: max tokens config (#6596) User configurable templates (#6420) docs: http proxy environment variables (#6594) feat: exclude subagent tool from code_execution filtering (#6531) Fix path for global agent skills (#6591) recipes: add mcp server (#6552) feat(gcp-vertex): add model list with org policy filtering (#6393) chore: encourage extension searching (#6582) blog: mobile apps consolidation and roadmap (#6580) chore: remove unused dependencies in cargo.toml (#6561) ...
* main: (68 commits) fix(docs): use dynamic import for globby ESM module (#6636) chore: trigger CI Document tab completion (#6635) Install goose-mcp crate dependencies (#6632) feat(goose): standardize agent-session-id for session correlation (#6626) chore: tweak release docs (#6571) fix(goose): propagate session_id across providers and MCP (#6584) increase worker threads for ci (#6614) docs: todo tutorial update (#6613) Added goose doc map md file for goose agent to find relevant doc easily. (#6598) add back goose branding to home (#6617) fix: actually set the working dir for extensions from session (#6612) Multi chat (#6428) Lifei/fixed accumulated token count (#6587) Dont show MCP UI/Apps until tool is approved (#6492) docs: max tokens config (#6596) User configurable templates (#6420) docs: http proxy environment variables (#6594) feat: exclude subagent tool from code_execution filtering (#6531) Fix path for global agent skills (#6591) ...
* main: (68 commits) fix(docs): use dynamic import for globby ESM module (#6636) chore: trigger CI Document tab completion (#6635) Install goose-mcp crate dependencies (#6632) feat(goose): standardize agent-session-id for session correlation (#6626) chore: tweak release docs (#6571) fix(goose): propagate session_id across providers and MCP (#6584) increase worker threads for ci (#6614) docs: todo tutorial update (#6613) Added goose doc map md file for goose agent to find relevant doc easily. (#6598) add back goose branding to home (#6617) fix: actually set the working dir for extensions from session (#6612) Multi chat (#6428) Lifei/fixed accumulated token count (#6587) Dont show MCP UI/Apps until tool is approved (#6492) docs: max tokens config (#6596) User configurable templates (#6420) docs: http proxy environment variables (#6594) feat: exclude subagent tool from code_execution filtering (#6531) Fix path for global agent skills (#6591) ...
* main: (68 commits) fix(docs): use dynamic import for globby ESM module (#6636) chore: trigger CI Document tab completion (#6635) Install goose-mcp crate dependencies (#6632) feat(goose): standardize agent-session-id for session correlation (#6626) chore: tweak release docs (#6571) fix(goose): propagate session_id across providers and MCP (#6584) increase worker threads for ci (#6614) docs: todo tutorial update (#6613) Added goose doc map md file for goose agent to find relevant doc easily. (#6598) add back goose branding to home (#6617) fix: actually set the working dir for extensions from session (#6612) Multi chat (#6428) Lifei/fixed accumulated token count (#6587) Dont show MCP UI/Apps until tool is approved (#6492) docs: max tokens config (#6596) User configurable templates (#6420) docs: http proxy environment variables (#6594) feat: exclude subagent tool from code_execution filtering (#6531) Fix path for global agent skills (#6591) ...
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Summary
Fix wrong accumulated token count for databricks claude models.
Why
For Databricks Claude models, the usage is in all the chunks with same
prompt_token. But only the last one with finish_reason hascompletion_token.With the change in this PR #6493, for each api,
Actual: we adds
prompt_tokenin all the chunksExpected: we should only add once, which can use the chunk with finish_reason.
What
completion_token. Ifcompletion_tokenis null, this means we don't have the usage yet.Type of Change
AI Assistance
Testing
Unit tests and manual tests